Learning Layer-wise Equivariances Automatically using Gradients
Convolutions encode equivariance symmetries into neural networks leading to
better generalisation performance. However, symmetries provide fixed hard
constraints on the functions a network can represent, need to be specified in
advance, and cannot be adapted. Our goal is to allow flexible symmetry
constraints that can automatically be learned from data using gradients.
Learning symmetry and associated weight connectivity structures from scratch is
difficult for two reasons. First, it requires efficient and flexible
parameterisations of layer-wise equivariances. Second, symmetries act as
constraints and are therefore not encouraged by training losses measuring data
fit. To overcome these challenges, we improve parameterisations of soft
equivariance and learn the amount of equivariance in layers by optimising the
marginal likelihood, estimated using differentiable Laplace approximations. The
objective balances data fit and model complexity, enabling layer-wise symmetry
discovery in deep networks. We demonstrate the ability to automatically learn
layer-wise equivariances on image classification tasks, achieving equivalent or
improved performance over baselines with hard-coded symmetry.
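
To make the objective concrete, the following is a minimal sketch, not the paper's implementation, of a differentiable Laplace estimate of the log marginal likelihood for a toy model. The scalar prior precision here is only a stand-in for the per-layer soft-equivariance hyperparameters described above; all names and settings are illustrative, and constants that do not depend on the hyperparameter are dropped.

```python
# Minimal sketch, not the paper's implementation: a Laplace estimate of the
# log marginal likelihood for a toy Bayesian linear model. The scalar prior
# precision below is only a stand-in for the per-layer soft-equivariance
# hyperparameters described above; all names and settings are illustrative.
import math
import torch

torch.manual_seed(0)

# Toy regression data with known observation noise (variance 0.01).
X = torch.randn(64, 5)
y = X @ torch.randn(5, 1) + 0.1 * torch.randn(64, 1)
NOISE_VAR = 0.01

def neg_log_joint(w, log_prior_prec):
    """Negative log joint (up to constants that do not depend on the
    hyperparameter): Gaussian likelihood plus a Gaussian prior on weights."""
    prior_prec = log_prior_prec.exp()
    nll = 0.5 * ((X @ w - y) ** 2).sum() / NOISE_VAR
    neg_log_prior = 0.5 * prior_prec * (w ** 2).sum() - 0.5 * w.numel() * log_prior_prec
    return nll + neg_log_prior

def laplace_log_marginal(log_prior_prec, n_inner=200):
    # 1) Inner loop: find the MAP weights for the current hyperparameter.
    w = torch.zeros(5, 1, requires_grad=True)
    opt = torch.optim.Adam([w], lr=0.1)
    for _ in range(n_inner):
        opt.zero_grad()
        neg_log_joint(w, log_prior_prec).backward()
        opt.step()
    w_map = w.detach()
    # 2) Laplace: log p(D) ~ log p(D, w*) - 0.5 * logdet(H / (2 * pi)),
    #    with H the Hessian of the negative log joint at the MAP estimate.
    H = torch.autograd.functional.hessian(
        lambda v: neg_log_joint(v.view(5, 1), log_prior_prec), w_map.view(-1))
    d = w_map.numel()
    return (-neg_log_joint(w_map, log_prior_prec)
            - 0.5 * torch.logdet(H) + 0.5 * d * math.log(2 * math.pi))

# The hyperparameter can now be selected (or optimised) against this estimate.
for lp in (-2.0, 0.0, 2.0):
    val = laplace_log_marginal(torch.tensor(lp))
    print(f"log prior precision {lp:+.1f}: Laplace log marginal ~ {val.item():.1f}")
```

The three evaluations illustrate how the estimate trades data fit against an Occam penalty, which is the quantity the paper differentiates to select the amount of symmetry per layer.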
Convolutional Gaussian Processes
We present a practical way of introducing convolutional structure into Gaussian processes, making them more suited to high-dimensional inputs like images. The main contribution of our work is the construction of an inter-domain inducing point approximation that is well-tailored to the convolutional kernel. This allows us to gain the generalisation benefit of a convolutional kernel, together with fast but accurate posterior inference. We investigate several variations of the convolutional kernel, and apply it to MNIST and CIFAR-10, which have both been known to be challenging for Gaussian processes. We also show how the marginal likelihood can be used to find an optimal weighting between convolutional and RBF kernels to further improve performance. We hope that this illustration of the usefulness of a marginal likelihood will help automate discovering architectures in larger models.
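
As a concrete illustration of the kernel construction (a sketch, not the paper's code), the example below builds an additive convolutional kernel by summing an RBF patch kernel over all pairs of overlapping patches. The patch size and toy images are illustrative; the inter-domain inducing points discussed above would live in this patch space rather than in image space.

```python
# Minimal sketch, not the paper's code: an additive convolutional kernel
# between images, built from an RBF kernel on overlapping image patches.
import numpy as np

def extract_patches(img, patch_size=3):
    """All overlapping patch_size x patch_size patches of a 2-D image,
    flattened into rows of a (num_patches, patch_size**2) array."""
    H, W = img.shape
    patches = [
        img[i:i + patch_size, j:j + patch_size].ravel()
        for i in range(H - patch_size + 1)
        for j in range(W - patch_size + 1)
    ]
    return np.stack(patches)

def rbf(A, B, lengthscale=1.0, variance=1.0):
    """Standard RBF kernel between rows of A and rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sq / lengthscale ** 2)

def conv_kernel(img1, img2, patch_size=3):
    """Convolutional kernel: the patch kernel summed over all patch pairs."""
    P1 = extract_patches(img1, patch_size)
    P2 = extract_patches(img2, patch_size)
    return rbf(P1, P2).sum()

# Toy usage on two random 8x8 "images".
rng = np.random.default_rng(0)
a, b = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))
print(conv_kernel(a, a), conv_kernel(a, b))
```

A weighted sum of this kernel and a plain RBF kernel on the full image, with the weights chosen by marginal likelihood, corresponds to the kernel combination mentioned above.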
Distributed Variational Inference in Sparse Gaussian Process Regression and Latent Variable Models
Gaussian processes (GPs) are a powerful tool for probabilistic inference over
functions. They have been applied to both regression and non-linear
dimensionality reduction, and offer desirable properties such as uncertainty
estimates, robustness to over-fitting, and principled ways for tuning
hyper-parameters. However, the scalability of these models to big datasets
remains an active topic of research. We introduce a novel re-parametrisation of
variational inference for sparse GP regression and latent variable models that
allows for an efficient distributed algorithm. This is done by exploiting the
decoupling of the data given the inducing points to re-formulate the evidence
lower bound in a Map-Reduce setting. We show that the inference scales well
with data and computational resources, while preserving a balanced distribution
of the load among the nodes. We further demonstrate the utility in scaling
Gaussian processes to big data. We show that GP performance improves with
increasing amounts of data in regression (on flight data with 2 million
records) and latent variable modelling (on MNIST). The results show that GPs
perform better than many common models often used for big data.
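
To show how the decoupling enables a Map-Reduce formulation, here is a minimal sketch (not the authors' implementation) that computes a collapsed sparse-GP evidence lower bound from per-shard sufficient statistics: each node returns only a few small matrices, which a reduce step sums before the bound is assembled. The RBF kernel, shard layout, and toy data are illustrative.

```python
# Minimal sketch, not the authors' implementation: the collapsed sparse-GP
# evidence lower bound assembled from per-shard statistics. Each "map" call
# touches only its own shard of data; the "reduce" step sums small M x M and
# M x 1 matrices. Kernel, shard count, and toy data are illustrative.
import numpy as np

def rbf(A, B, lengthscale=1.0, variance=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sq / lengthscale ** 2)

rng = np.random.default_rng(0)
N, D, M, noise_var = 1000, 2, 20, 0.1
X = rng.normal(size=(N, D))
y = np.sin(X[:, :1]) + 0.1 * rng.normal(size=(N, 1))
Z = X[rng.choice(N, M, replace=False)]               # inducing inputs

def map_step(X_shard, y_shard):
    """Per-node statistics; these sums decouple across data points given Z."""
    Kuf = rbf(Z, X_shard)                             # M x n_shard
    return (Kuf @ Kuf.T,                              # sum_i k(Z, x_i) k(x_i, Z)
            Kuf @ y_shard,                            # sum_i k(Z, x_i) y_i
            np.trace(rbf(X_shard, X_shard)),          # sum_i k(x_i, x_i)
            float((y_shard ** 2).sum()))              # sum_i y_i^2

# Map: run independently per shard (in practice, on separate nodes).
stats = [map_step(X[idx], y[idx]) for idx in np.array_split(np.arange(N), 4)]
# Reduce: sum the small per-shard statistics.
Phi1 = sum(s[0] for s in stats)
Phi2 = sum(s[1] for s in stats)
tr_Kff = sum(s[2] for s in stats)
yty = sum(s[3] for s in stats)

# Assemble the collapsed variational bound (Titsias-style) from the sums.
Kuu = rbf(Z, Z) + 1e-6 * np.eye(M)
A = Kuu + Phi1 / noise_var
quad = yty / noise_var - (Phi2.T @ np.linalg.solve(A, Phi2)).item() / noise_var ** 2
logdet = N * np.log(noise_var) + np.linalg.slogdet(A)[1] - np.linalg.slogdet(Kuu)[1]
trace_term = (tr_Kff - np.trace(np.linalg.solve(Kuu, Phi1))) / (2 * noise_var)
elbo = -0.5 * (N * np.log(2 * np.pi) + logdet + quad) - trace_term
print(f"distributed ELBO estimate: {elbo:.2f}")
```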